Enhancing the ability to tell “where” acute malnutrition is of concern

Results from a quest using probability proportional to population size (PPS)-based survey data

Tomás Zaba

2025-03-28

Introduction

Why Spatial Analysis?


  • To provide actionable information on where.

  • Currently, when an analysis is done, the whole polygon is classified as one, hiding the spatial variation of acute malnutrition.

  • Current protocols try to address this need by:

    • Guiding users to disaggregate surveys when DEFF is >= 1.3 (heterogeneous distribution across surveyed areas)
    • This is also applied during FRC reviews
  • However, this is still not effective or efficient in providing actionable information to IPC end-users.

  • It does not inform targeting (of resources, and of reaching those in utmost need of treatment)

We need a spatial dimension…

…Why?

  • Our analyses inform life-saving interventions, particularly in highly vulnerable countries;
  • Countries struggle to identify the populations most in need, who should be served first.
    • Different strategies are adopted to define targeting within districts, counties, etc.
    • These are based on experience and assumptions rather than on evidence.

A few examples:

Somalia: The Operational Priority Areas (OPA)

  • Step 1: rank districts from the highest IPC Phase to the lowest
  • Step 2: within-district targeting is then based on two criteria: (i) the most common vulnerable livelihood zones; (ii) high population density zones.

Mozambique: Farthest Communities Are Most Vulnerable

  • Targeting involves selecting the communities located farthest from the district center, based on the assumption that these areas have limited access to basic services and are therefore the most vulnerable.

So what…?

  • …while these strategies are logical, their assumptions do not always hold and outcomes are not always predictable, given the complexity and wide range of factors that lead to acute malnutrition.

  • Enhancing the ability to pinpoint where acute malnutrition is a concern within the IPC framework would significantly strengthen its role in providing life-saving information.

In this regard, I conducted operational research that consisted of:

  • Predicting the prevalence of acute malnutrition in unsurveyed or unsampled locations by leveraging data from surveyed locations → spatial interpolation.

Based on the first law of geography:

“Nearby things are more similar than distant things.”

Tobler W. (1970)


We already apply this law in our protocols → protocols for similar areas
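To make the idea concrete, here is a minimal sketch using inverse distance weighting (IDW). IDW is a simpler interpolator than the kriging used in this work and is shown only because it fits in a few lines. The coordinates, prevalence values, and the `power` parameter are made up for illustration.

```python
import math

def idw_predict(points, target, power=2.0):
    """Predict a value at `target` as a distance-weighted mean of sampled points.

    points: list of (x, y, value); target: (x, y).
    Closer points receive larger weights (weight = 1 / distance**power).
    """
    num = 0.0
    den = 0.0
    for x, y, value in points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a sampled point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical cluster locations (projected km) and GAM prevalence (%)
clusters = [(0.0, 0.0, 6.0), (10.0, 0.0, 18.0), (0.0, 10.0, 9.0)]
near_high = idw_predict(clusters, (9.0, 0.0))  # close to the 18% cluster
near_low = idw_predict(clusters, (1.0, 0.0))   # close to the 6% cluster
```

A location next to the high-prevalence cluster gets a higher prediction than one next to the low-prevalence cluster, which is the first law of geography at work.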

Questions 🧐

  1. Does spatial interpolation produce reliable (precise and accurate) estimates using small-scale survey data, such as district-level surveys?

  2. How comparable are the predicted estimates to the observed prevalence estimates of the original survey results?

Data & Preparation

Data Source

Two example datasets were used:


  1. Nine district-level SMART surveys conducted in Karamoja Region, Uganda.
    • Data collected in April 2021


  2. Locality-level SMART surveys conducted in North Darfur, Sudan
    • Data collected in October 2024 (see the other presentation on this)

Data Wrangling

Aspatial

flowchart TD

A(WFHZ)
B(MUAC)
C(Exclude rows with missing GPS coord.)
D(Calculate WFHZ and define AMN)
E(Remove outliers)
F(Calculate MFAZ and define AMN)
G(Remove outliers)
H(Get % aggregated at cluster ID)

A --> C --> D --> E --> H
B --> C --> F --> G --> H
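The WFHZ branch of this flow could be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: z-scores are assumed to be already computed, acute malnutrition is defined as WFHZ < -2 (oedema is ignored), and outliers are taken to be z-scores more than 3 units from the observed mean. The field names and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cluster_gam_wfhz(rows, cutoff=-2.0, flag_range=3.0):
    """rows: dicts with keys cluster, lat, lon, wfhz. Returns {cluster: GAM %}."""
    # 1. Exclude rows with missing GPS coordinates
    rows = [r for r in rows if r["lat"] is not None and r["lon"] is not None]
    # 2. Remove outliers: z-scores further than `flag_range` from the observed mean
    centre = mean(r["wfhz"] for r in rows)
    rows = [r for r in rows if abs(r["wfhz"] - centre) <= flag_range]
    # 3. Define acute malnutrition and aggregate to a percentage per cluster
    counts = defaultdict(lambda: [0, 0])  # cluster -> [cases, children]
    for r in rows:
        counts[r["cluster"]][0] += r["wfhz"] < cutoff
        counts[r["cluster"]][1] += 1
    return {c: 100.0 * cases / n for c, (cases, n) in counts.items()}

# Hypothetical child-level records
rows = [
    {"cluster": 1, "lat": 2.5, "lon": 34.1, "wfhz": -2.2},
    {"cluster": 1, "lat": 2.5, "lon": 34.1, "wfhz": -0.3},
    {"cluster": 1, "lat": None, "lon": None, "wfhz": -3.1},  # dropped: no GPS
    {"cluster": 2, "lat": 2.7, "lon": 34.3, "wfhz": 0.2},
    {"cluster": 2, "lat": 2.7, "lon": 34.3, "wfhz": -1.1},
    {"cluster": 2, "lat": 2.7, "lon": 34.3, "wfhz": 6.5},    # dropped: outlier
]
gam_by_cluster = cluster_gam_wfhz(rows)
```

The MUAC branch would follow the same pattern with MFAZ in place of WFHZ.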

Spatial

flowchart TD

A(Set and/or reproject CRS)
B(Get mean GPS coord. by cluster ID)
C(Calculate spatial weights)
D(Smooth rates)
E(Krige)

A --> B --> C --> D --> E
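The first and last steps of this flow can be sketched in plain Python. This is a minimal illustration, not the implementation behind the results: it assumes planar (projected) coordinates, skips the spatial-weights and rate-smoothing steps, and uses an exponential semivariogram whose partial sill (`psill`) and range (`rng`) are picked by hand rather than fitted to data. All values are hypothetical.

```python
import math

def cluster_centroids(records):
    """records: (cluster_id, x, y) per child. Returns {cluster_id: (mean x, mean y)}."""
    sums = {}
    for cid, x, y in records:
        sx, sy, n = sums.get(cid, (0.0, 0.0, 0))
        sums[cid] = (sx + x, sy + y, n + 1)
    return {cid: (sx / n, sy / n) for cid, (sx, sy, n) in sums.items()}

def semivariance(h, psill=1.0, rng=10.0):
    """Exponential semivariogram with zero nugget (assumed parameters)."""
    return psill * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_krige(samples, target):
    """samples: list of (x, y, value). Returns (prediction, kriging variance)."""
    n = len(samples)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Semivariances between samples, bordered by the "weights sum to 1" constraint
    a = [[semivariance(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [semivariance(dist(p, target)) for p in samples] + [1.0]
    sol = solve(a, b)
    weights, lagrange = sol[:n], sol[n]
    prediction = sum(w * p[2] for w, p in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, variance

# Hypothetical GPS points per child (projected km) and cluster GAM rates (%)
gps = [(1, 0.0, 0.0), (1, 2.0, 0.0), (2, 10.0, 0.0), (2, 10.0, 2.0),
       (3, 0.0, 9.0), (3, 0.0, 11.0), (4, 10.0, 10.0)]
rates = {1: 6.0, 2: 18.0, 3: 9.0, 4: 12.0}
centres = cluster_centroids(gps)
samples = [(x, y, rates[cid]) for cid, (x, y) in centres.items()]
pred, var = ordinary_krige(samples, (5.0, 5.0))
```

The square root of the kriging variance is the kriging standard error, which grows as the target moves away from sampled clusters.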

Assessment of Model-fit

Cross-validation: leave-one-out resampling method

source: ArcGIS Pro


How does it work?

  • After estimating the interpolation model from all blue points, the value of the red point is hidden, and the remaining points are used to predict the value of the hidden point. The prediction is then compared to the measured value. This process repeats for all 10 points.
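The resampling loop itself is short. The sketch below uses inverse distance weighting as a stand-in interpolator (an assumption made to keep the example self-contained; the analysis used kriging), with hypothetical points.

```python
import math

def idw(train, target, power=2.0):
    """Stand-in interpolator (inverse distance weighting), not the kriging model."""
    weights = [
        (1.0 / max(math.hypot(x - target[0], y - target[1]), 1e-12) ** power, v)
        for x, y, v in train
    ]
    return sum(w * v for w, v in weights) / sum(w for w, _ in weights)

def leave_one_out(points, predict=idw):
    """Hide each point in turn, predict it from the rest, return (observed, predicted)."""
    pairs = []
    for i, (x, y, observed) in enumerate(points):
        rest = points[:i] + points[i + 1:]  # every point except the hidden one
        pairs.append((observed, predict(rest, (x, y))))
    return pairs

# Hypothetical sampling points: (x, y, GAM %)
points = [(0, 0, 6.0), (1, 0, 7.0), (0, 1, 6.5),
          (9, 9, 18.0), (10, 9, 17.0), (9, 10, 19.0)]
pairs = leave_one_out(points)
errors = [pred - obs for obs, pred in pairs]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
```

The resulting observed and predicted pairs are what a cross-validation scatter plot and its summary statistics are built from.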

Results

Spatial Variation of GAM by WFHZ

Survey sampling points

Predicted surface map

Choropleth map: County

Choropleth map: District

Predicted Estimates of GAM by WFHZ

Observed district prevalence estimates vs predicted prevalence
| District | Observed prevalence (%) | Predicted prevalence (%) | Bias (predicted minus observed, pp) | Minimum prevalence (%) | Maximum prevalence (%) | Median prevalence (%) |
|---|---|---|---|---|---|---|
| Abim | 6.28 | 9.46 | 3.19 | 0.60 | 11.31 | 6.52 |
| Amudat | 10.04 | 12.39 | 2.34 | 3.14 | 15.23 | 8.80 |
| Kaabong | 18.07 | 6.50 | -11.57 | 0.93 | 29.76 | 13.42 |
| Karenga | 8.19 | 4.44 | -3.75 | 0.31 | 22.31 | 6.19 |
| Kotido | 8.01 | 7.76 | -0.25 | 0.93 | 15.69 | 6.86 |
| Moroto | 11.85 | 11.59 | -0.26 | 0.00 | 28.15 | 10.56 |
| Nabilatuk | 7.50 | 5.02 | -2.48 | 0.00 | 23.04 | 9.21 |
| Nakapiripirit | 7.26 | 8.47 | 1.22 | 0.00 | 18.41 | 8.77 |
| Napak | 7.77 | 8.46 | 0.69 | 0.00 | 18.84 | 7.98 |
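The bias column can be reproduced from the observed and predicted columns, and summarised, with a few lines. The figures below are copied from the table; because they are already rounded to two decimals, recomputed biases may differ from the table by 0.01.

```python
# GAM by WFHZ (%), copied from the table above
observed = {"Abim": 6.28, "Amudat": 10.04, "Kaabong": 18.07, "Karenga": 8.19,
            "Kotido": 8.01, "Moroto": 11.85, "Nabilatuk": 7.50,
            "Nakapiripirit": 7.26, "Napak": 7.77}
predicted = {"Abim": 9.46, "Amudat": 12.39, "Kaabong": 6.50, "Karenga": 4.44,
             "Kotido": 7.76, "Moroto": 11.59, "Nabilatuk": 5.02,
             "Nakapiripirit": 8.47, "Napak": 8.46}

# Bias = predicted minus observed, in percentage points
bias = {d: round(predicted[d] - observed[d], 2) for d in observed}
mean_bias = sum(bias.values()) / len(bias)
mean_abs_error = sum(abs(b) for b in bias.values()) / len(bias)
largest = max(bias, key=lambda d: abs(bias[d]))  # district with the largest absolute bias
```

Kaabong has by far the largest absolute bias, which pulls the mean bias below zero.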

Did the Model Fit the Data?

Predicted rates in the cross-validation results against the observed rates


R² = 0.806

  • Positive and strong correlation
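As a reminder of what this statistic does and does not capture, here is a small sketch that treats R² as the squared Pearson correlation between observed and cross-validated predicted rates (an assumption about how the figure was computed). The values are hypothetical.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

# Hypothetical cross-validation pairs (GAM %)
observed = [6.0, 7.0, 6.5, 18.0, 17.0, 19.0]
predicted = [6.8, 6.3, 6.5, 17.9, 18.2, 17.5]
r2 = r_squared(observed, predicted)

# Adding a constant 5 points to every prediction leaves R² unchanged
shifted = r_squared(observed, [p + 5.0 for p in predicted])
```

Because a constant shift does not change R², a strong correlation is best read alongside the bias figures rather than on its own.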

Spatial Variation of GAM by MUAC

Survey sampling points

Predicted surface map

Choropleth map: County

Choropleth map: District

Predicted Estimates of GAM by MUAC

Observed district prevalence estimates vs predicted prevalence
| District | Observed prevalence (%) | Predicted prevalence (%) | Bias (predicted minus observed, pp) | Minimum prevalence (%) | Maximum prevalence (%) | Median prevalence (%) |
|---|---|---|---|---|---|---|
| Abim | 4.20 | 2.41 | -1.79 | 0.00 | 22.77 | 4.25 |
| Amudat | 2.57 | 1.42 | -1.15 | 0.00 | 13.13 | 2.55 |
| Kaabong | 22.36 | 15.78 | -6.58 | 5.18 | 33.61 | 19.33 |
| Karenga | 10.03 | 11.93 | 1.90 | 2.56 | 22.39 | 9.92 |
| Kotido | 15.61 | 17.49 | 1.88 | 3.27 | 32.75 | 17.75 |
| Moroto | 14.76 | 16.36 | 1.60 | 0.33 | 27.16 | 15.63 |
| Nabilatuk | 10.59 | 11.75 | 1.16 | 0.51 | 27.16 | 11.23 |
| Nakapiripirit | 12.62 | 12.76 | 0.15 | 1.54 | 18.39 | 12.82 |
| Napak | 8.30 | 10.20 | 1.89 | 3.99 | 28.40 | 9.07 |

Did the Model Fit the Data?

R² = 0.836

  • Positive and strong correlation

Uncertainty

What influences high uncertainty?

Standardized Prediction Standard Errors

\(Z = \frac{\text{Prediction} - \text{Observed value}}{\text{Kriging standard error}}\)

GAM by WFHZ

GAM by MUAC

Interpretation

  • Z = 0: prediction is exactly equal to the observed value
  • Positive Z: prediction is higher than the observed value.
  • Negative Z: prediction is lower than the observed value.

It tells how many kriging standard errors the predicted value is away from the observed value.

  • -3 < Z < 3 👍
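A minimal sketch of the calculation and the check, assuming an acceptable band of 3 standard errors either side of zero. The site names and values are hypothetical.

```python
def standardized_error(prediction, observed, kriging_se):
    """(prediction - observed) / kriging standard error."""
    if kriging_se <= 0:
        raise ValueError("kriging standard error must be positive")
    return (prediction - observed) / kriging_se

def outside_band(z, limit=3.0):
    """True when the standardized error falls outside -limit..+limit."""
    return abs(z) > limit

# Hypothetical cross-validation results: (site, prediction %, observed %, kriging SE)
checks = [("site A", 12.0, 10.0, 2.0), ("site B", 6.5, 18.0, 3.0)]
results = {name: standardized_error(p, o, se) for name, p, o, se in checks}
flagged = [name for name, z in results.items() if outside_band(z)]
```

Here site A is over-predicted by one standard error, while site B is under-predicted by more than three and is flagged.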

Limitations


  • The more sampling points are clustered together (as seen in Kaabong district) rather than spatially dispersed (as in Nakapiripirit district), the higher the uncertainty. Clustering leaves large areas without direct observations, forcing the model to extrapolate from the nearest values. In such cases, predictions at unsampled locations rest on the model's fit to neighbouring observations rather than on local data.

  • The tendency for sampled clusters to be concentrated in specific geographic areas rather than evenly distributed across the surveyed region is an inherent characteristic of PPS-based surveys. These surveys allocate more clusters to densely populated areas and fewer to sparsely populated regions.

    • As a result, varying levels of confidence in predictions should be expected when performing spatial interpolation with PPS-based surveys. Unlike surveys using spatial sampling methods, PPS-based surveys do not ensure an even spatial distribution of clusters, leading to areas with higher uncertainty in predictions.
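To see why PPS concentrates clusters, here is a sketch of systematic PPS selection from a cumulative population list. It illustrates the general technique under assumed inputs and is not the sampling code of any particular survey. The village names and populations are hypothetical.

```python
import random

def pps_systematic(populations, n_clusters, rng):
    """Select clusters with probability proportional to population size.

    populations: {village: population}. A large village can be selected more than once.
    """
    total = sum(populations.values())
    interval = total / n_clusters
    start = rng.random() * interval  # random start within the first interval
    targets = iter(start + k * interval for k in range(n_clusters))
    selected = []
    cumulative = 0.0
    t = next(targets)
    for name, pop in populations.items():
        cumulative += pop
        # Every target that falls within this village's cumulative range selects it
        while t is not None and t <= cumulative:
            selected.append(name)
            t = next(targets, None)
    return selected

# Hypothetical district: one town and three small villages
populations = {"town": 9000, "village_1": 300, "village_2": 300, "village_3": 400}
selected = pps_systematic(populations, 10, random.Random(1))
town_share = selected.count("town") / len(selected)
```

With these numbers the town receives nine of the ten clusters, so most sampling points sit in one place and the rest of the district is covered by a single cluster.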

Actionable Insights


  • Based on the results, spatial interpolation using PPS-based survey data (e.g., SMART) appears to generate reliable estimates for decision-making.
    • However, this could be due to chance. Further validation with additional data is necessary.

Actionable Insights for Standard IPC analyses


  • The results may provide a better approach for identifying hotspots and guiding program targeting.

  • Predicting results at lower administrative levels could be a breakthrough in estimating the number of children in need of treatment. This is particularly relevant when surveys are conducted at higher administrative levels, such as in Somalia, where assessments are done by livelihood zones rather than districts or counties.

  • This approach could also offer a more effective alternative to the current method used in IPC AMN protocols, where surveys must be disaggregated when the design effect (DEFF) exceeds 1.3, requiring at least five clusters and 100 observations.
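For reference, the design effect behind that rule can be approximated from cluster-level counts. The sketch below uses one common estimator (cluster-level variance of the ratio estimator divided by the simple-random-sample variance); survey software may compute DEFF differently, and the cluster counts are hypothetical.

```python
def design_effect(clusters):
    """clusters: list of (cases, children) per cluster. Returns DEFF for prevalence."""
    k = len(clusters)
    n = sum(m for _, m in clusters)
    p = sum(y for y, _ in clusters) / n
    # Variance of the prevalence under the cluster design (linearization)
    var_cluster = k / (k - 1) * sum((y - p * m) ** 2 for y, m in clusters) / n ** 2
    # Variance if the same n children had been a simple random sample
    var_srs = p * (1 - p) / (n - 1)
    return var_cluster / var_srs

# Hypothetical surveys: same number of children, different spread of cases
even = [(2, 20), (2, 20), (2, 20), (2, 20), (2, 20)]
patchy = [(0, 20), (0, 20), (1, 20), (2, 20), (9, 20)]
deff_even = design_effect(even)
deff_patchy = design_effect(patchy)
exceeds_threshold = deff_patchy > 1.3  # threshold as stated in the bullet above
```

When cases pile up in one cluster, DEFF rises well above 1.3, yet the statistic only says that heterogeneity exists and not where it is located.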

Actionable Insights for FRC Reviews


The results may be a better solution for highlighting hotspots and informing programme targeting.

  • Possible advantages
    • Affected countries would be able to tell where to prioritize/target.
  • Possible disadvantages
    • I do not see any that are relevant compared with the advantages.

Actionable Insights for Risk Analysis


By highlighting areas that are more affected than others,

  • It provides a clear indication of regions on the brink of crossing IPC AMN Phase 5 thresholds, enabling increased monitoring of risk factors.

  • It helps identify locations where localized SMART surveys may be necessary for a more precise assessment.

Next steps


The approach needs to be validated with more data.

  • SMART or other representative survey data.
    • Can be district/county/locality-specific survey or
    • Can be a regional/province/higher administrative level survey with interpolation to lower admin levels (e.g., the case Belihu mentioned yesterday)
    • Could also be tested using South Sudan FSNMS data, where only a few (9) clusters per county are sampled, but data is aggregated at the state or domain level for analysis.
  • Sentinel sites data:
    • Available options:
      • Kenya NDMA sentinel site data, which includes GPS coordinates. (Access to this data must be requested from the Kenya NDMA.)
  • Explore alternative modeling approaches to improve estimates:
    • Gaussian process regression
    • Model actual weight-for-height or MUAC data instead of GAM rates (both in Kriging and Gaussian process models)